Parabolic Continual Learning
Haoming Yang, Ali Hasan, Vahid Tarokh
Regularizing continual learning techniques is important for anticipating algorithmic behavior under new realizations of data. We introduce a new approach to continual learning that imposes the properties of a parabolic partial differential equation (PDE) to regularize the expected behavior of the loss over time. This class of parabolic PDEs has a number of favorable properties that allow us to analyze both the error incurred through forgetting and the error induced through generalization. Specifically, we do this by imposing boundary conditions, where the boundary is given by a memory buffer. Using the memory buffer as the boundary, we can enforce long-term dependencies by bounding the expected error by the boundary loss. Finally, we illustrate the empirical performance of the method on a series of continual learning tasks.
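The buffer-as-boundary idea can be illustrated with a toy experiment: train on the new task while adding a heavily weighted penalty on the memory buffer's loss, so the buffer acts like a boundary condition that pins down behavior on past data. This is only a minimal sketch of the general idea; the linear model, the penalty weight lam, and all variable names are illustrative assumptions, not the paper's actual construction.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5

# "Past task" data kept in a small memory buffer (playing the role of the boundary).
X_buf = rng.normal(size=(20, d))
y_buf = X_buf @ np.ones(d)

# New task data with a different ground-truth parameter.
X_new = rng.normal(size=(100, d))
y_new = X_new @ (2 * np.ones(d))

w = rng.normal(size=d)
lam = 10.0   # weight on the boundary (buffer) loss term
lr = 0.01
for _ in range(500):
    grad_new = X_new.T @ (X_new @ w - y_new) / len(X_new)
    grad_buf = X_buf.T @ (X_buf @ w - y_buf) / len(X_buf)
    w -= lr * (grad_new + lam * grad_buf)   # new-task loss + boundary penalty

buf_mse = np.mean((X_buf @ w - y_buf) ** 2)   # forgetting stays small: the
new_mse = np.mean((X_new @ w - y_new) ** 2)   # boundary loss is explicitly bounded
```

With a large boundary weight, the error on the buffer remains small even though training is driven by the new task, mirroring the abstract's claim that the expected error can be bounded by the boundary loss.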
Learning Mamba as a Continual Learner
Continual learning (CL) aims to efficiently learn and accumulate knowledge from a data stream with different distributions. By formulating CL as a sequence prediction task, meta-continual learning (MCL) makes it possible to meta-learn an efficient continual learner on top of recent advanced sequence models, e.g., Transformers. Although attention-free models (e.g., Linear Transformers) can ideally match CL's essential objective and efficiency requirements, they usually do not perform well in MCL. Given that the attention-free Mamba achieves excellent performance matching Transformers' on general sequence modeling tasks, in this paper we aim to answer a question: can attention-free Mamba perform well on MCL? By formulating Mamba with a selective state space model (SSM) for MCL tasks, we propose to meta-learn Mamba as a continual learner, referred to as MambaCL. By incorporating a selectivity regularization, we can effectively train MambaCL. Through comprehensive experiments across various CL tasks, we also explore how Mamba and other models perform in different MCL scenarios. Our experiments and analyses highlight the promising performance and generalization capabilities of Mamba in MCL.

Continual learning (CL) aims to efficiently learn and accumulate knowledge in a non-stationary data stream (De Lange et al., 2021; Wang et al., 2024) containing different tasks. To ensure computational and memory efficiency, CL methods are designed to learn from data streams while minimizing the storage of historical data or limiting running-memory growth, such as restricting the growth rate to be constant or sub-linear (De Lange et al., 2021; Ostapenko et al., 2021). The data stream can also be seen as a context of the tasks for performing prediction for a new query. (D. Gong is the corresponding author.)
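The key ingredient the abstract refers to — a selective SSM — is a linear recurrence whose transition, write, and readout gates all depend on the current input, letting the model choose what to retain from the stream. The following is a minimal sketch of that selectivity mechanism only; the gating functions, shapes, and names are illustrative assumptions, not Mamba's exact parameterization.

```python
import numpy as np

def selective_ssm_scan(x, Wa, Wb, Wc):
    """Minimal selective state-space recurrence (Mamba-style sketch).

    All three gates are functions of the current input x_t; this
    input dependence is the 'selectivity' that lets the state decide
    what to keep and what to overwrite.
    """
    T, d = x.shape
    h = np.zeros(d)
    ys = np.empty((T, d))
    for t in range(T):
        a = 1.0 / (1.0 + np.exp(-(x[t] @ Wa)))   # input-dependent decay in (0, 1)
        b = np.tanh(x[t] @ Wb)                    # input-dependent write gate
        h = a * h + b * x[t]                      # selective state update
        ys[t] = (x[t] @ Wc) * h                   # input-dependent readout
    return ys

rng = np.random.default_rng(0)
d = 4
x = rng.normal(size=(10, d))
Wa = rng.normal(size=(d, d)) * 0.5
Wb = rng.normal(size=(d, d)) * 0.5
Wc = rng.normal(size=(d, d)) * 0.5
y = selective_ssm_scan(x, Wa, Wb, Wc)
```

In the MCL framing, the stream x would be the context of past-task examples, and meta-learning would train the gate parameters so the recurrence itself behaves as a continual learner.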
Theoretical Insights into Overparameterized Models in Multi-Task and Replay-Based Continual Learning
Mohammadamin Banayeeanzade, Mahdi Soltanolkotabi, Mohammad Rostami
Multi-task learning (MTL) is a machine learning paradigm that aims to improve the generalization performance of a model on multiple related tasks by training it simultaneously on those tasks. Unlike MTL, where the model has instant access to the training data of all tasks, continual learning (CL) involves adapting to new sequentially arriving tasks over time without forgetting the previously acquired knowledge. Despite the wide practical adoption of CL and MTL and extensive literature on both areas, there remains a gap in the theoretical understanding of these methods when used with overparameterized models such as deep neural networks. This paper studies overparameterized linear models as a proxy for more complex models. We develop theoretical results describing the effect of various system parameters on the model's performance in an MTL setup. Specifically, we study the impact of model size, dataset size, and task similarity on generalization error and knowledge transfer. Additionally, we present theoretical results characterizing the performance of replay-based CL models. Our results reveal the impact of buffer size and model capacity on the forgetting rate in a CL setup and shed light on some state-of-the-art CL methods. Finally, through extensive empirical evaluations, we demonstrate that our theoretical findings also apply to deep neural networks, offering valuable guidance for designing MTL and CL models in practice.
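The proxy setting the abstract describes — overparameterized linear models with replay — is easy to instantiate numerically: with more parameters than samples, least squares has many exact solutions, and np.linalg.lstsq returns the minimum-norm one. A hypothetical sketch of the replay comparison, with dimensions, buffer size, and variable names chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 50, 10   # overparameterized regime: d >> n per task

# Two linear tasks with different ground-truth parameters.
w1, w2 = rng.normal(size=d), rng.normal(size=d)
X1, X2 = rng.normal(size=(n, d)), rng.normal(size=(n, d))
y1, y2 = X1 @ w1, X2 @ w2

# Minimum-norm fit on task 2 alone (no replay): task 1 is ignored.
w_seq, *_ = np.linalg.lstsq(X2, y2, rcond=None)

# Minimum-norm fit on task 2 plus a small replay buffer of task-1 samples.
Xr = np.vstack([X1[:5], X2])
yr = np.concatenate([y1[:5], y2])
w_rep, *_ = np.linalg.lstsq(Xr, yr, rcond=None)

forget_seq = np.mean((X1 @ w_seq - y1) ** 2)  # task-1 error without replay
forget_rep = np.mean((X1 @ w_rep - y1) ** 2)  # task-1 error with replay
```

Because the replayed system is still underdetermined, the model interpolates the buffer exactly, so the buffer contributes zero forgetting — a toy illustration of how buffer size and model capacity jointly control the forgetting rate.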
Backdoor Attack in Prompt-Based Continual Learning
Trang Nguyen, Anh Tran, Nhat Ho
The adaptability of human learning, absorbing new knowledge without forgetting previously acquired information, remains a significant challenge for machine learning models. Continual learning (CL) endeavors to narrow this chasm by guiding models to sequentially learn new tasks while maintaining high performance on earlier ones. An outstanding solution to CL is the prompt-based approach [45, 57, 58, 55, 40], which leverages the power of pre-trained models and employs a set of trainable prompts for flexible model instruction, accommodating data from various tasks. Thanks to their ability to remember without storing a memory buffer, prompt-based CL methods are particularly suitable for scenarios prioritizing data privacy, such as those involving multiple data suppliers. Nonetheless, such promising results can inadvertently become vulnerabilities, exposing CL to security threats. Indeed, while CL methods effectively address catastrophic forgetting by preserving and incorporating previously acquired knowledge, they may also unwittingly retain knowledge compromised by adversarial actions. These threats become even more formidable in the multi-data-supplier scenario of prompt-based approaches, where the supplied data might contain hidden harmful information. One potential threat is the backdoor attack, which manipulates a neural network to exhibit the attacker's desired behavior whenever the input contains a specific backdoor trigger.
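The threat model described above is usually realized through data poisoning: a malicious supplier stamps a small trigger pattern onto a fraction of their training examples and relabels them to a target class. The following is a generic sketch of that poisoning step, not the attack studied in this paper; the patch shape, position, and poisoning rate are illustrative assumptions.

```python
import numpy as np

def poison(images, labels, target_label, rate=0.1, seed=0):
    """Backdoor data-poisoning sketch: stamp a small trigger patch
    onto a random fraction of images and relabel them to the
    attacker's target class."""
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()
    idx = rng.choice(len(images), size=int(rate * len(images)), replace=False)
    images[idx, -3:, -3:] = 1.0   # 3x3 bright corner patch: the trigger
    labels[idx] = target_label    # relabel poisoned samples to the target class
    return images, labels, idx

# Toy 28x28 grayscale "dataset" with ten classes.
imgs = np.zeros((100, 28, 28))
labs = np.arange(100) % 10
p_imgs, p_labs, idx = poison(imgs, labs, target_label=7)
```

A model trained on the poisoned set behaves normally on clean inputs but predicts the target class whenever the trigger patch is present — precisely the retained "compromised knowledge" that CL's resistance to forgetting can preserve across tasks.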
Hard ASH: Sparsity and the right optimizer make a continual learner
In class-incremental learning, neural networks typically suffer from catastrophic forgetting. We show that an MLP featuring a sparse activation function and an adaptive learning rate optimizer can compete with established regularization techniques on the Split-MNIST task. We highlight the effectiveness of the Adaptive SwisH (ASH) activation function in this context and introduce a novel variant, Hard Adaptive SwisH (Hard ASH), to further enhance learning retention.

Continual learning presents a unique challenge for artificial neural networks, particularly in the class-incremental setting (Hsu et al., 2019), where a single network must remember old classes that have left the training set. In this paper I explore an overlooked approach that does not require any techniques developed specifically for continual learning.
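The mechanism that makes sparse activations help here is that each sample fires only a small fraction of units, so tasks overwrite fewer shared weights. As a rough illustration of a "hard" sparse activation in this spirit — keeping only the top fraction of pre-activations per sample and zeroing the rest with a hard gate instead of a smooth SwisH gate — one could write the following. The percentile thresholding and the keep rate are my assumptions for illustration, not the paper's exact Hard ASH formulation.

```python
import numpy as np

def hard_sparse_activation(x, keep=0.05):
    """Keep only the top `keep` fraction of pre-activations in each
    sample (last axis) and zero out the rest: a hard gate in place
    of a smooth SwisH-style gate. Illustrative sketch only."""
    # Per-sample threshold at the (1 - keep) quantile of the pre-activations.
    thresh = np.quantile(x, 1.0 - keep, axis=-1)[..., None]
    return np.where(x >= thresh, x, 0.0)

x = np.random.default_rng(0).normal(size=(2, 100))
y = hard_sparse_activation(x, keep=0.05)   # ~5 active units per sample
```

With only ~5% of units active per input, gradient updates for a new class touch a small, largely disjoint subset of weights, which is the intuition behind sparsity improving retention without any CL-specific machinery.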